65 research outputs found

    X-som: A flexible ontology mapper

    Get PDF
    System interoperability is a well known issue, especially for heterogeneous information systems, where ontologybased representations may support automatic and usertransparent integration. In this paper we present X-SOM: an ontology mapping and integration tool. The contribution of our tool is a modular and extensible architecture that automatically combines several matching techniques by means of a neural network, performing also ontology debugging to avoid inconsistencies. Besides describing the tool components, we discuss the prototype implementation, which has been tested against the OAEI 2006 benchmark with promising results.

    No Bits Left Behind

    Get PDF
    One of the key tenets of database system design is making efficient use of storage and memory resources. However, existing database system implementations are actually extremely wasteful of such resources; for example, most systems leave a great deal of empty space in tuples, index pages, and data pages, and spend many CPU cycles reading cold records from disk that are never used. In this paper, we identify a number of such sources of waste, and present a series of techniques that limit this waste (e.g., forcing better memory locality for hot data and using empty space in index pages to cache popular tuples) without substantially complicating interfaces or system design. We show that these techniques effectively reduce memory requirements for real scenarios from the Wikipedia database (by up to 17.8×) while increasing query performance (by up to 8×)

    Relational Cloud: The Case for a Database Service

    Get PDF
    In this paper, we make the case for â databases as a serviceâ (DaaS), with two target scenarios in mind: (i) consolidation of data management functionality for large organizations and (ii) outsourcing data management to a cloud-based service provider for small/medium organizations. We analyze the many challenges to be faced, and discuss the design of a database service we are building, called Relational Cloud. The system has been designed from scratch and combines many recent advances and novel solutions. The prototype we present exploits multiple dedicated storage engines, provides high-availability via transparent replication, supports automatic workload partitioning and live data migration, and provides serializable distributed transactions. While the system is still under active development, we are able to present promising initial results that showcase the key features of our system. The tests are based on TPC benchmarks and real-world data from epinions.com, and show our partitioning, scalability and balancing capabilities

    Schism: a Workload-Driven Approach to Database Replication and Partitioning

    Get PDF
    We present Schism, a novel workload-aware approach for database partitioning and replication designed to improve scalability of shared-nothing distributed databases. Because distributed transactions are expensive in OLTP settings (a fact we demonstrate through a series of experiments), our partitioner attempts to minimize the number of distributed transactions, while producing balanced partitions. Schism consists of two phases: i) a workload-driven, graph-based replication/partitioning phase and ii) an explanation and validation phase. The first phase creates a graph with a node per tuple (or group of tuples) and edges between nodes accessed by the same transaction, and then uses a graph partitioner to split the graph into k balanced partitions that minimize the number of cross-partition transactions. The second phase exploits machine learning techniques to find a predicate-based explanation of the partitioning strategy (i.e., a set of range predicates that represent the same replication/partitioning scheme produced by the partitioner). The strengths of Schism are: i) independence from the schema layout, ii) effectiveness on n-to-n relations, typical in social network databases, iii) a unified and fine-grained approach to replication and partitioning. We implemented and tested a prototype of Schism on a wide spectrum of test cases, ranging from classical OLTP workloads (e.g., TPC-C and TPC-E), to more complex scenarios derived from social network websites (e.g., Epinions.com), whose schema contains multiple n-to-n relationships, which are known to be hard to partition. Schism consistently outperforms simple partitioning schemes, and in some cases proves superior to the best known manual partitioning, reducing the cost of distributed transactions up to 30%.Quanta Computer (Firm) (T-Party Project

    Workload-Aware Database Monitoring and Consolidation

    Get PDF
    In most enterprises, databases are deployed on dedicated database servers. Often, these servers are underutilized much of the time. For example, in traces from almost 200 production servers from different organizations, we see an average CPU utilization of less than 4%. This unused capacity can be potentially harnessed to consolidate multiple databases on fewer machines, reducing hardware and operational costs. Virtual machine (VM) technology is one popular way to approach this problem. However, as we demonstrate in this paper, VMs fail to adequately support database consolidation, because databases place a unique and challenging set of demands on hardware resources, which are not well-suited to the assumptions made by VM-based consolidation. Instead, our system for database consolidation, named Kairos, uses novel techniques to measure the hardware requirements of database workloads, as well as models to predict the combined resource utilization of those workloads. We formalize the consolidation problem as a non-linear optimization program, aiming to minimize the number of servers and balance load, while achieving near-zero performance degradation. We compare Kairos against virtual machines, showing up to a factor of 12× higher throughput on a TPC-C-like benchmark. We also tested the effectiveness of our approach on real-world data collected from production servers at Wikia.com, Wikipedia, Second Life, and MIT CSAIL, showing absolute consolidation ratios ranging between 5.5:1 and 17:1

    ERA: A Framework for Economic Resource Allocation for the Cloud

    Full text link
    Cloud computing has reached significant maturity from a systems perspective, but currently deployed solutions rely on rather basic economics mechanisms that yield suboptimal allocation of the costly hardware resources. In this paper we present Economic Resource Allocation (ERA), a complete framework for scheduling and pricing cloud resources, aimed at increasing the efficiency of cloud resources usage by allocating resources according to economic principles. The ERA architecture carefully abstracts the underlying cloud infrastructure, enabling the development of scheduling and pricing algorithms independently of the concrete lower-level cloud infrastructure and independently of its concerns. Specifically, ERA is designed as a flexible layer that can sit on top of any cloud system and interfaces with both the cloud resource manager and with the users who reserve resources to run their jobs. The jobs are scheduled based on prices that are dynamically calculated according to the predicted demand. Additionally, ERA provides a key internal API to pluggable algorithmic modules that include scheduling, pricing and demand prediction. We provide a proof-of-concept software and demonstrate the effectiveness of the architecture by testing ERA over both public and private cloud systems -- Azure Batch of Microsoft and Hadoop/YARN. A broader intent of our work is to foster collaborations between economics and system communities. To that end, we have developed a simulation platform via which economics and system experts can test their algorithmic implementations

    Kaskade: Graph Views for Efficient Graph Analytics

    Full text link
    Graphs are an increasingly popular way to model real-world entities and relationships between them, ranging from social networks to data lineage graphs and biological datasets. Queries over these large graphs often involve expensive subgraph traversals and complex analytical computations. These real-world graphs are often substantially more structured than a generic vertex-and-edge model would suggest, but this insight has remained mostly unexplored by existing graph engines for graph query optimization purposes. Therefore, in this work, we focus on leveraging structural properties of graphs and queries to automatically derive materialized graph views that can dramatically speed up query evaluation. We present KASKADE, the first graph query optimization framework to exploit materialized graph views for query optimization purposes. KASKADE employs a novel constraint-based view enumeration technique that mines constraints from query workloads and graph schemas, and injects them during view enumeration to significantly reduce the search space of views to be considered. Moreover, it introduces a graph view size estimator to pick the most beneficial views to materialize given a query set and to select the best query evaluation plan given a set of materialized views. We evaluate its performance over real-world graphs, including the provenance graph that we maintain at Microsoft to enable auditing, service analytics, and advanced system optimizations. Our results show that KASKADE substantially reduces the effective graph size and yields significant performance speedups (up to 50X), in some cases making otherwise intractable queries possible

    LST-Bench: Benchmarking Log-Structured Tables in the Cloud

    Full text link
    Log-Structured Tables (LSTs), also commonly referred to as table formats, have recently emerged to bring consistency and isolation to object stores. With the separation of compute and storage, object stores have become the go-to for highly scalable and durable storage. However, this comes with its own set of challenges, such as the lack of recovery and concurrency management that traditional database management systems provide. This is where LSTs such as Delta Lake, Apache Iceberg, and Apache Hudi come into play, providing an automatic metadata layer that manages tables defined over object stores, effectively addressing these challenges. A paradigm shift in the design of these systems necessitates the updating of evaluation methodologies. In this paper, we examine the characteristics of LSTs and propose extensions to existing benchmarks, including workload patterns and metrics, to accurately capture their performance. We introduce our framework, LST-Bench, which enables users to execute benchmarks tailored for the evaluation of LSTs. Our evaluation demonstrates how these benchmarks can be utilized to evaluate the performance, efficiency, and stability of LSTs. The code for LST-Bench is open sourced and is available at https://github.com/microsoft/lst-bench/
    • …
    corecore